[V5] Return a BatchEncoding dict from apply_chat_template by default again #42567
Rocketknight1 merged 11 commits into main
Conversation
Flip the default return type for `apply_chat_template` to match the underlying tokenizer
[For maintainers] Suggested jobs to run (before merge): run-slow: blenderbot, bloom, cohere, gpt2, gpt_sw3

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
cc @LysandreJik - this was one of the V5 PRs before, do I need to do anything special with this one, or can we just merge it to main?
zucchini-nlp left a comment:
great, it was already approved once so lgtm 😄
    if not tokenize:
        return_dict = False  # dicts are only returned by the tokenizer anyway
Makes me wonder: do we need to support the combination of `tokenize=True, return_dict=False`, or can we deprecate/remove `return_dict` over time? I can't think of cases where users want a plain list of tokens as output.
Maybe we can get rid of it over time, but I think it's fine as a backward compatibility flag for now!
Sure, I meant after v5 plus several more minor releases, and if users are fine with it.
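For reference, a minimal sketch (my own, not code from this PR) of the `tokenize` / `return_dict` combinations being discussed, assuming a checkpoint whose tokenizer ships a chat template (the model name below is only an example):

```python
from transformers import AutoTokenizer

# assumed example checkpoint; any model with a chat template behaves the same way
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
chat = [{"role": "user", "content": "Hello!"}]

# tokenize=False: a formatted string; return_dict is forced to False internally
text = tokenizer.apply_chat_template(chat, tokenize=False)

# tokenize=True with the new default return_dict=True: a BatchEncoding dict
encoding = tokenizer.apply_chat_template(chat)
print(encoding["input_ids"])

# tokenize=True, return_dict=False: the backward-compatibility path, a plain list of token ids
ids = tokenizer.apply_chat_template(chat, return_dict=False)
```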
[V5] Return a BatchEncoding dict from apply_chat_template by default again (huggingface#42567)

* Flip the default return type for `apply_chat_template` to match the underlying tokenizer
* Remove test_tokenization_for_chat tests, which no longer do anything useful
* Remove test_tokenization_for_chat tests, which no longer do anything useful
* Fix test_encode_message tests
* Fix test_encode_message tests
* nit fix
* Trigger tests
* Remove test_tokenization_for_chat
* make fixup
* Add a little test to make sure that doesn't happen again
* make fixup
This is basically PR #41626 again! Some of it got clobbered in the tokenizer refactor, but it's just as good the second time.
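As a rough sketch of the behaviour this restores (my own illustration, not code taken from the PR): with the dict default back, the output of `apply_chat_template` can be unpacked straight into `generate()`, just like the output of calling the tokenizer directly. The checkpoint name is again only an assumed example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed example; any chat checkpoint works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

chat = [{"role": "user", "content": "Write a haiku about tokenizers."}]

# With the restored default, this returns a BatchEncoding (input_ids, attention_mask, ...)
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```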